1 Overview

The aim of this report is to characterize neural cells in the developing human cortex using the provided Seurat object.



To gain insight from your data, I propose the following analyses:

  • Cell Type Identification (Subtyping).
  • Differential Gene Expression Analysis.
  • Trajectory Analysis.


In this analysis, we examine single-cell data from human cortex cells: 5000 cells from 3 batches.

I will perform the Seurat workflow after checking whether there is any batch effect to correct.

2 Data Analysis

2.1 Quality control

As Figure 1 (violin plots) shows, the provided data are of good quality. There are 0% mitochondrial genes, the minimum number of genes detected in a cell is 501, and the minimum number of molecules detected in a cell is 642.
NB: The data seem to have been filtered already, so no cut-off is needed.
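The QC reasoning above rests on three per-cell metrics: total molecules, genes detected, and the percentage of counts from mitochondrial genes. The following Python (standard library) sketch shows how these metrics are computed from raw counts and how one could check that every cell already passes a set of thresholds, which would mean no cut is needed. It is not the Seurat code used in this report. Two assumptions apply: mitochondrial genes are taken to be those whose symbol starts with "MT-" (the usual human convention), and the threshold values are hypothetical.

```python
from typing import Dict, Iterable


def qc_metrics(cell_counts: Dict[str, int], mito_prefix: str = "MT-") -> Dict[str, float]:
    """Per-cell QC metrics, analogous to nCount_RNA, nFeature_RNA and percent.mt in Seurat."""
    n_count = sum(cell_counts.values())                        # total molecules (UMIs)
    n_feature = sum(1 for c in cell_counts.values() if c > 0)   # number of genes detected
    # Counts from genes whose symbol starts with the (assumed) mitochondrial prefix
    mito = sum(c for g, c in cell_counts.items()
               if g.upper().startswith(mito_prefix.upper()))
    percent_mt = 100.0 * mito / n_count if n_count else 0.0
    return {"nCount": n_count, "nFeature": n_feature, "percent_mt": percent_mt}


def already_filtered(cells: Iterable[Dict[str, int]], min_features: int = 200,
                     min_counts: int = 500, max_percent_mt: float = 5.0) -> bool:
    """True if every cell passes the (hypothetical) thresholds, i.e. no further cut is needed."""
    for cell in cells:
        m = qc_metrics(cell)
        if (m["nFeature"] < min_features or m["nCount"] < min_counts
                or m["percent_mt"] > max_percent_mt):
            return False
    return True


# Toy cell, invented for illustration: 10 molecules, 3 detected genes, no mitochondrial counts
print(qc_metrics({"SOX2": 3, "PAX6": 2, "MT-CO1": 0, "GAPDH": 5}))
```

In practice the minimum values reported above (501 genes, 642 molecules) play the role of the per-cell checks in `already_filtered`.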

2.2 Cell Type Identification (Subtyping):

After clustering with Seurat, the goal is to identify all cell populations present in your dataset using publicly available marker genes. This will allow us to explore the heterogeneity of neural subtypes during this developmental window (pcw16, pcw20, pcw21 and pcw24).
A potential limitation is that the provided dataset may not capture the full diversity of cell subtypes, because some cell types are rare. Cluster assignment could also be difficult because some marker genes are not robust.

However, when I checked the metadata slot of the provided Seurat object, I saw that the pre-analysis had already been done, including clustering and the definition of cell populations. As Figure 2 shows, 23 cell types have been identified, including one cluster of unknown cells.
No batch effect was detected. In Figure 3, populations coming from different batches cluster together, which is why I skip the correction step.
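The batch-effect check above is visual: cells from different batches cluster together. The following Python (standard library) sketch shows one simple way to quantify it. For each cluster, it computes the Shannon entropy of the batch labels, normalized so that 1.0 means the cluster is spread evenly over all batches and 0.0 means it comes from a single batch. This metric is my own illustrative choice and was not part of the report's workflow. The cluster and batch labels are assumed to be the per-cell metadata columns.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence


def batch_mixing(clusters: Sequence[str], batches: Sequence[str]) -> Dict[str, float]:
    """Normalized Shannon entropy of batch labels inside each cluster (1.0 = evenly mixed)."""
    n_batches = len(set(batches))
    members = defaultdict(list)
    for cluster, batch in zip(clusters, batches):
        members[cluster].append(batch)
    scores = {}
    for cluster, labels in members.items():
        total = len(labels)
        # Entropy of the batch composition of this cluster
        h = sum(-(c / total) * math.log(c / total) for c in Counter(labels).values())
        # Divide by the maximum possible entropy (all batches equally represented)
        scores[cluster] = h / math.log(n_batches) if n_batches > 1 else 0.0
    return scores


# Toy labels, invented for illustration: cluster "a" is mixed, cluster "b" comes from one batch
print(batch_mixing(["a", "a", "a", "b", "b", "b"], ["b1", "b2", "b3", "b1", "b1", "b1"]))
```

When batches differ in size, a cluster's score is best compared with the entropy of the global batch proportions rather than with 1.0.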

2.3 Differential Gene Expression Analysis:

After quality control and cell type identification, we can perform a differential gene expression (DGE) analysis to identify genes that are specifically upregulated in each subtype.
This will provide insights into the molecular characteristics of these subtypes and may reveal functional differences.
Because the dataset is a subset, which might limit the depth of the analysis, I normalize and scale the data before the marker identification step.
The heatmap shows the expression of the top 30 markers for each cell population found. Several observations can be drawn. For example, Outer radial glia 1 and 2 seem to express the same blocks of genes, with one block (the first, at the top) overexpressed in the Outer radial glia 2 population; the same holds for Interneurons 1 and 2 …
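The normalization and scaling step mentioned above can be sketched as follows. This is a Python (standard library) illustration, not the Seurat code used here. It assumes Seurat's "LogNormalize" scheme: each count is divided by the cell's total, multiplied by a scale factor (10,000 here), and log1p-transformed. Each gene is then centred and scaled to a z-score across cells, and values above 10 are capped, mirroring the `scale.max` cap of ScaleData. The matrix layout (rows = cells, columns = genes) is an assumption of the sketch.

```python
import math
from statistics import mean, stdev
from typing import List


def log_normalize(counts: List[List[float]], scale_factor: float = 1e4) -> List[List[float]]:
    """counts[cell][gene] -> log1p(count / total counts of the cell * scale_factor)."""
    out = []
    for cell in counts:
        total = sum(cell)
        out.append([math.log1p(c / total * scale_factor) if total else 0.0 for c in cell])
    return out


def scale_genes(norm: List[List[float]], max_value: float = 10.0) -> List[List[float]]:
    """Centre and scale each gene (column) to a z-score across cells, capping values above max_value."""
    n_genes = len(norm[0])
    scaled = [[0.0] * n_genes for _ in norm]
    for g in range(n_genes):
        column = [row[g] for row in norm]
        mu = mean(column)
        sd = stdev(column) if len(column) > 1 else 0.0
        for i, v in enumerate(column):
            z = (v - mu) / sd if sd > 0 else 0.0
            scaled[i][g] = min(max_value, z)  # only the upper side is capped
    return scaled


# Toy matrix, invented for illustration: 2 cells x 2 genes
print(scale_genes(log_normalize([[1, 1], [3, 1]])))
```

Scaling gives every gene the same weight in the heatmap, so blocks of co-expressed markers stand out regardless of each gene's absolute expression level.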

NB: I left the “unknown” population on purpose, to try to extract information about it manually from the literature after extracting its list of marker genes.

## Centering and scaling data matrix


List of the top 30 marker genes of the “Unknown” cells: FOXD2, NGF, TNNT2, FMOD, OSR1, SIX2, NA, EFEMP1, ACTG2, RAB17, TWIST2, UGT3A2, C7, TFAP2B, SCARA5, OGN, OMD, PRRX2, SLC22A6, MRGPRF, PGR, ITIH2, GJB2, DLK1, ALDH1A2, ISLR, ALDH1A3, C16orf89, SERPIND1, PVALB
For example, the specific function of FOXD2 has not yet been determined; NGF: nerve growth factor; TNNT2: troponin T2, cardiac type …
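To illustrate how a "top 30 markers" list like the one above is produced, the following Python (standard library) sketch ranks genes for one cluster. Genes are ranked by the difference in mean log-normalized expression between the cluster and all other cells, keeping only genes expressed in at least a minimum fraction of the cluster's cells. This is a deliberate simplification: Seurat's FindAllMarkers uses a statistical test (Wilcoxon rank-sum by default) and reports adjusted p-values, which this sketch does not. The function name and the `min_pct` value are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def top_markers(expr: Sequence[Sequence[float]], genes: Sequence[str], labels: Sequence[str],
                cluster: str, n: int = 30, min_pct: float = 0.1) -> List[str]:
    """Rank genes upregulated in `cluster` versus all other cells (rows = cells, columns = genes)."""
    inside = [row for row, lab in zip(expr, labels) if lab == cluster]
    outside = [row for row, lab in zip(expr, labels) if lab != cluster]
    scored = []
    for g, name in enumerate(genes):
        a = [row[g] for row in inside]
        b = [row[g] for row in outside]
        # Fraction of the cluster's cells expressing the gene
        pct_in = sum(v > 0 for v in a) / len(a)
        if pct_in < min_pct:
            continue
        diff = mean(a) - mean(b)  # difference of mean log-normalized expression
        if diff > 0:              # keep upregulated genes only
            scored.append((diff, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:n]]


# Toy data, invented for illustration: only G1 is higher in the "Unknown" cluster
expr = [[2, 0, 1], [3, 0, 1], [0, 1, 1], [0, 2, 1]]
print(top_markers(expr, ["G1", "G2", "G3"], ["Unknown", "Unknown", "oRG", "oRG"], "Unknown"))
```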

2.4 Trajectory Analysis:

As cells move between states, they undergo transcriptional reconfiguration: some genes are silenced and others newly activated, producing a dynamic repertoire of proteins and metabolites that carry out their work. To explore the developmental trajectories of the identified cell subtypes in the human cortex, I will perform a trajectory analysis. This could reveal the differentiation paths and how these subtypes emerge during this developmental window (from pcw16 to pcw24).
The limitation here is that trajectory analysis assumes a continuous progression between cell states, which might not always reflect real physiological differentiation, a complex process.

To do that, I will use monocle3, an algorithm that learns the sequence of gene expression changes each cell must go through as part of a dynamic biological process. Once it has learned the overall “trajectory” of gene expression changes, Monocle can place each cell at its proper position in the trajectory. The workflow used follows monocle’s guide (Link).
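The idea of placing each cell on a trajectory can be illustrated with a Python (standard library) sketch. Pseudotime is taken as the shortest-path (geodesic) distance from a chosen root cell along a graph built on the embedding. This is a simplification, not monocle3's implementation: monocle3 learns a principal graph with `learn_graph()` and measures distances along it in `order_cells()`, whereas the sketch uses a symmetrized k-nearest-neighbour graph on the cell coordinates. The choice of `k` and of the root cell are assumptions. As in monocle3, cells not connected to the root get an infinite pseudotime.

```python
import heapq
import math
from typing import List, Sequence


def knn_graph(points: Sequence[Sequence[float]], k: int) -> List[dict]:
    """Symmetrized k-nearest-neighbour graph: adj[i][j] = Euclidean distance between cells i and j."""
    adj = [dict() for _ in points]
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in nearest[:k]:
            adj[i][j] = d
            adj[j][i] = d
    return adj


def pseudotime(points: Sequence[Sequence[float]], root: int, k: int = 3) -> List[float]:
    """Dijkstra shortest-path distance from the root cell; unreachable cells stay at infinity."""
    adj = knn_graph(points, k)
    dist = [math.inf] * len(points)
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


# Toy 2-D coordinates, invented for illustration: four cells on a line, rooted at the first one
print(pseudotime([(0, 0), (1, 0), (2, 0), (3, 0)], root=0, k=1))
```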


Figure 6: UMAP of cells colored by pseudotime




As Figure 6 shows, the cells are plotted along a pseudotime ranging from 0 (blue) to 15 (yellow). The cells in dark blue are presumed to be the first to appear during differentiation; according to Figures 7 and 8, these are Outer radial glia and Microglia from pcw16 and pcw20. They are followed by an intermediate stage, in red, made up of Migrating glutamatergic neurons from pcw20 and pcw21, and a final stage, also in red, made up of Interneurons from pcw24.
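The reading of Figure 6 above orders populations by where their cells fall on the pseudotime axis. The following Python (standard library) sketch makes that ordering explicit by sorting populations by their median pseudotime. Cells with infinite pseudotime (not connected to the root) are ignored. The population labels and values below are invented for illustration and are not results from this dataset.

```python
import math
from collections import defaultdict
from statistics import median
from typing import List, Sequence


def order_by_pseudotime(pt: Sequence[float], labels: Sequence[str]) -> List[str]:
    """Return population labels sorted by the median pseudotime of their (reachable) cells."""
    groups = defaultdict(list)
    for t, label in zip(pt, labels):
        if math.isfinite(t):  # skip cells unreachable from the root
            groups[label].append(t)
    medians = {label: median(ts) for label, ts in groups.items()}
    return sorted(medians, key=medians.get)


# Hypothetical values, invented for illustration
pt = [0.5, 1.0, 7.0, 8.0, 14.0, math.inf]
labels = ["oRG", "oRG", "MigGlut", "MigGlut", "IN", "IN"]
print(order_by_pseudotime(pt, labels))
```

The median is used rather than the mean so that a few outlying cells do not shift a whole population along the axis.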

## Aligning cells from different batches using Batchelor.
## Please remember to cite:
##   Haghverdi L, Lun ATL, Morgan MD, Marioni JC (2018). 'Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.' Nat. Biotechnol., 36(5), 421-427. doi: 10.1038/nbt.4091
## No preprocess_method specified, and aligned coordinates have been computed previously. Using preprocess_method = 'Aligned'

3 Conclusion and perspectives


After all these analyses, we have a global picture of the populations, their distribution over time, and the genes switched on and off at each stage.
We could also perform supplementary analyses, such as GO and pathway enrichment analysis on the differentially expressed genes, to gain insights into the biological processes and pathways enriched in each cell subtype and at each stage.
Other data types, e.g. single-cell ChIP-seq, could be integrated to explore chromatin dynamics during cell development and to gain a broader perspective.
Integrating spatial transcriptomic data could produce high-resolution maps of cellular sub-populations in the human cortex.
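The GO enrichment analysis proposed above usually rests on an over-representation test. Suppose a GO term contains n of the M genes in the universe, and N differentially expressed genes are selected. The p-value is the probability of seeing at least k term genes among the DE genes by chance, which is the upper tail of a hypergeometric distribution. The following Python (standard library) sketch computes that tail. It is a generic illustration of the test, not the API of any particular enrichment package, and it omits the multiple-testing correction a real analysis would apply across terms.

```python
from math import comb


def hypergeom_pvalue(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k), X ~ Hypergeometric(M genes in universe, n genes in the term, N DE genes drawn)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


# Toy numbers, invented for illustration: 10-gene universe, 3 term genes, 4 DE genes, all 3 term genes among them
print(hypergeom_pvalue(3, 10, 3, 4))
```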




  • pcw: post-conception week